Abstract



We investigate the performance of the scan (maximum likelihood ratio statistic) and of the
average likelihood ratio statistic in the problem of detecting a deterministic signal with
unknown spatial extent in the prototypical univariate sampled data model with white Gaussian
noise. Our results show that the scan statistic, a popular tool for detection problems, is
optimal only for the detection of signals with the smallest spatial extent. For signals with larger
spatial extent the scan is suboptimal, and the power loss can be considerable. In contrast, the
average likelihood ratio statistic is optimal for the detection of signals on all scales except
the smallest ones, where its performance is only slightly suboptimal. We give rigorous
mathematical statements of these results as well as heuristic explanations which suggest that the
essence of these findings applies to detection problems quite generally, such as the detection
of clusters in models involving densities or intensities or the detection of multivariate signals.
We present a modification of the average likelihood ratio that yields optimal detection of
signals with arbitrary spatial extent and which has the additional benefit of allowing for a fast
computation of the statistic. In contrast, optimal detection with the scan seems to require the
use of scale-dependent critical values.
1 Introduction and overview of results

We are concerned with the problem of detecting a deterministic signal with unknown spatial ex- 
tent against a noisy background. This problem arises in a wide range of applications, e.g. in 
epidemiology and astronomy, and has received considerable attention recently due to important 
problems in e.g. biosurveillance. The standard statistical tool to address this problem is the scan 
statistic (maximum likelihood ratio statistic), which considers the maximum of local likelihood 
ratio statistics on certain subsets of the data. There is a large body of work on scan statistics, see 
e.g. the references in Glaz and Balakrishnan (1999), Glaz et al. (2001), and Glaz et al. (2009). 
But there is also empirical evidence that the scan statistic is suboptimal, see e.g. Neill (2009) or 
Chan (2009). 

Siegmund (2001) and Gangnon and Clayton (2001) propose to use the average of the likelihood
ratio statistics instead of their maximum. In different contexts, various versions of the
average likelihood ratio were considered by Shiryaev (1963), Burnashev and Begmatov (1990),
and Dümbgen (1998). Chan (2009) and Chan and Zhang (2009) perform simulation studies for
various detection problems which suggest that the average likelihood ratio statistic is superior to 
the scan statistic. In light of these results, it is of interest to provide a theoretical investigation of 
the performance of both the scan and the average likelihood ratio. Such a theoretical comparison 
seems to be missing in the literature and appears to be quite relevant given the widespread use of 
the scan statistic as a standard tool for a range of detection problems. 

In the first part of this paper we perform such an investigation in the prototypical univariate 
sampled data model with white Gaussian noise and we obtain the following results: 

The scan statistic possesses optimal detection power only for signals with the smallest spatial 
extent. Otherwise the scan statistic is suboptimal, and the loss of power can be considerable for 
signals having a large spatial extent. In the case of the average likelihood ratio (ALR) statistic, 
these conclusions hold in reversed order: The ALR possesses optimal detection power for signals 
having large spatial extent, but is suboptimal for signals with small spatial extent. However, the 
loss of power in the latter case is so small that it is unlikely to be of concern, at least for most 
sample sizes considered today. 

In the second part of the paper we propose a modification of the ALR that results in universal 
optimality and allows efficient computation. The ALR averages the likelihood ratios pertaining 
to $\sim n^2$ stretches of the data, where $n$ is the sample size, resulting in an $O(n^2)$ algorithm. Thus
the use of the ALR is computationally infeasible even for moderate sample sizes. We introduce
a condensed ALR that averages only a certain subset of the likelihood ratios and we show that
this condensed ALR possesses optimal detection power for signals having arbitrary spatial extent.
Furthermore, this condensed ALR can be computed in almost linear time, viz. with an $O(n \log^2 n)$
algorithm. In light of the preceding discussion, it is arguably this improvement in computation 
time rather than the small gain in detection power that is the main advantage of this modification. 
We note that typically, an approximation introduced to make a procedure computationally less 
intensive will on the flip side degrade its performance somewhat. It is thus noteworthy that in the 
case of the ALR, our computationally efficient modification will actually lead to an improved (in 
fact: optimal) performance. 

We give sharp theoretical results on the performance of the ALR, the scan statistic, and the
newly proposed condensed ALR in Sections 2 and 3. Since these results are asymptotic, we complement
them in Section 4 with a simulation study that illustrates the results. Various modifications to the
scan have been proposed in the literature in order to improve its detection power. We include two
such modifications in our simulation study to obtain a more informative comparison with the ALR.
We summarize our conclusions in Section 5 and defer proofs to Section 6.

2 Comparison of the scan and the average likelihood ratio 

We observe

$$Y_i = f_n\Big(\frac{i}{n}\Big) + Z_i, \qquad i = 1, \ldots, n,$$

where the $Z_i$ are i.i.d. $N(0, 1)$ and $f_n(x) = \mu_n 1_{I_n}(x)$ with $I_n = (\frac{j_n}{n}, \frac{k_n}{n}]$, $0 \le j_n < k_n \le n$.
Both the amplitude $\mu_n$ and the support $I_n$ are unknown. The task is to decide whether a signal is
present, i.e. whether $\mu_n \neq 0$.

The above sampled data model with Gaussian white noise serves as a prototype for many 
important applications. The heuristics and results we develop below suggest that our conclusions 
will carry over, at least qualitatively, to related detection problems involving multivariate signals, 
non-Gaussian errors, or the detection of clusters in models involving densities or intensities as 
described in Kulldorff (1997). 

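The likelihood ratio statistic for testing $\mu_n = 0$ when $I_n$ is known is computed as

$$\exp\Big(\frac{(Y_n(I_n))^2}{2}\Big), \quad \text{where } Y_n(I_n) := \frac{\sum_{i:\, i/n \in I_n} Y_i}{\sqrt{n |I_n|}} = \frac{\sum_{i=j_n+1}^{k_n} Y_i}{\sqrt{k_n - j_n}}.$$

Since $I_n$ is unknown, the standard approach is to scan over all intervals
$I \in \mathcal{J}_n := \{(\frac{j}{n}, \frac{k}{n}],\ 0 \le j < k \le n\}$ for the largest likelihood ratio statistic. The resulting scan
statistic (maximum likelihood ratio statistic) is equivalently given by

$$M_n := \max_{0 \le j < k \le n} \Big| Y_n\Big(\Big(\frac{j}{n}, \frac{k}{n}\Big]\Big) \Big|.$$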



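In contrast, the average likelihood ratio statistic (ALR) averages the likelihood ratios over all
intervals $I \in \mathcal{J}_n$:

$$A_n := \frac{1}{n^2} \sum_{j=0}^{n-1} \sum_{k=j+1}^{n} \exp\Big(\frac{\big(Y_n((\frac{j}{n}, \frac{k}{n}])\big)^2}{2}\Big).$$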
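A direct, quadratic-time evaluation of the two statistics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the ALR is normalized by $n^2$ (the divisor referred to in the heuristic discussion of this section), and all $n(n+1)/2$ local statistics are held in memory, so the sketch is only practical for small $n$.

```python
import numpy as np

def local_statistics(y):
    """Flat array of Y_n((j/n, k/n]) over all index pairs 0 <= j < k <= n."""
    n = len(y)
    s = np.concatenate(([0.0], np.cumsum(y)))  # s[k] = y_1 + ... + y_k
    j, k = np.triu_indices(n + 1, k=1)         # every pair with j < k
    return (s[k] - s[j]) / np.sqrt(k - j)      # standardized interval sums

def scan_statistic(y):
    """M_n: the largest absolute local statistic."""
    return np.max(np.abs(local_statistics(y)))

def average_likelihood_ratio(y):
    """A_n: sum of exp(Y^2 / 2) over all intervals, divided by n^2 (assumed normalization)."""
    n = len(y)
    return np.sum(np.exp(local_statistics(y) ** 2 / 2.0)) / n ** 2
```

Both functions take the observation vector $(Y_1, \ldots, Y_n)$; the cumulative sums give every interval sum in constant time, which is where the $O(n^2)$ total cost comes from.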

To quantify the performance of these statistics, we look for the smallest value of $|\mu_n|$ that
allows a reliable detection of the signal. It will be explained below that in order to achieve optimality,
a test must be able to asymptotically detect signals $f_n$ with

$$|\mu_n| \sqrt{|I_n|} \ \ge\ \frac{\sqrt{2 \log \frac{1}{|I_n|}} + b_n}{\sqrt{n}}, \quad \text{where } b_n \to \infty. \tag{1}$$

Note that for signals $f_n$ on small scales, i.e. with $|I_n| \to 0$, (1) is equivalent to

$$|\mu_n| \sqrt{|I_n|} \ \ge\ (\sqrt{2} + \epsilon_n) \sqrt{\frac{\log \frac{1}{|I_n|}}{n}}, \tag{2}$$

where $\epsilon_n$ can go to 0 but not too fast: $\epsilon_n \sqrt{\log \frac{1}{|I_n|}} \to \infty$.

For signals $f_n$ on large scales, i.e. with $\liminf_n |I_n| > 0$, (1) is equivalent to

$$|\mu_n| \ \ge\ \frac{b_n}{\sqrt{n}}, \quad \text{where } b_n \to \infty. \tag{3}$$

It is impossible to detect signals with noticeably smaller mean: In the case of signals on small
scales, a classical argument in the minimax framework (see e.g. Lepski and Tsybakov (2000),
Dümbgen and Spokoiny (2001), and Dümbgen and Walther (2008)) shows that if '$+\epsilon_n$' is replaced
by '$-\epsilon_n$' in (2), then there exists no test that can detect such $f_n$ with nontrivial asymptotic power.
Likewise, a contiguity argument as in Dümbgen and Walther (2008) shows that in the case of large
scales the condition (3) is necessary for any test to be consistent against $f_n$. On the other hand, we
will exhibit below a test that will detect with asymptotic power 1 signals satisfying (1). Thus the
detection threshold given by (1) marks a standard that is attainable but cannot be improved upon.



We will now examine how the scan and the ALR compare against this standard. 
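Theorem 1 Let $\kappa_n$ be the $(1 - \alpha)$ quantile of the null distribution of $M_n$.

1. If $|\mu_n| \sqrt{|I_n|} \ge (\sqrt{2} + \epsilon_n) \sqrt{\frac{\log n}{n}}$ with $\epsilon_n \sqrt{\log n} \to \infty$, then $P_{f_n}(M_n > \kappa_n) \to 1$.

2. If $|\mu_n| \sqrt{|I_n|} = (\sqrt{2} - \epsilon_n) \sqrt{\frac{\log n}{n}}$ with $\epsilon_n$ as above, then $\limsup_n P_{f_n}(M_n > \kappa_n) \le \alpha$.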



Thus the detection threshold for the scan is $\sqrt{2 \log n / n}$, irrespective of the spatial extent of the
signal. Comparing to (2), one sees that the scan is optimal only for signals having the smallest
spatial extent, i.e. for $|I_n|$ close to $\frac{1}{n}$. As an illustration, if $|I_n| = n^{-p}$, $p \in (0, 1]$, then detection
is possible only if $|\mu_n| \sqrt{|I_n|}$ is at least $p^{-1/2}$ times larger than the optimal threshold. In the case
of large scales, comparison with (3) shows that this multiplier diverges to infinity, and thus the
scan suffers from a noticeably inferior performance. These results are illustrated in the simulation
study in Section 4 and explain the sometimes disappointing performance of the scan observed in
the literature.
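The multiplier $p^{-1/2}$ can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and it compares just the leading terms of the two thresholds, $\sqrt{2 \log n / n}$ for the scan and $\sqrt{2 \log(1/|I_n|) / n}$ for the optimal threshold on small scales, ignoring the vanishing $\epsilon_n$ terms.

```python
import math

def optimal_threshold(n, extent):
    """Leading term of the optimal threshold for |mu_n| * sqrt(|I_n|) on small scales."""
    return math.sqrt(2.0 * math.log(1.0 / extent) / n)

def scan_threshold(n):
    """Leading term of the scan's detection threshold; it does not depend on the extent."""
    return math.sqrt(2.0 * math.log(n) / n)

def scan_multiplier(n, p):
    """Ratio of the scan threshold to the optimal one when |I_n| = n**(-p), 0 < p <= 1."""
    return scan_threshold(n) / optimal_threshold(n, n ** (-p))
```

For example, with $n = 10000$ and $p = 1/4$, i.e. $|I_n| = 0.1$, `scan_multiplier(10000, 0.25)` evaluates to $(1/4)^{-1/2} = 2$: the scan needs a signal of twice the optimal norm.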

We note that an alternative way to analyze the performance of the scan is to put a prior on the 
unknown spatial extent of the signal, e.g. the uniform distribution on (0, 1). It is readily seen that 
this analysis leads to the same conclusions as the case of large scales above, i.e. the scan is far 
from optimal. 

The next theorem details the performance of the average likelihood ratio: 

Theorem 2 Let $\tau_n$ be the $(1 - \alpha)$ quantile of the null distribution of $A_n$.

1. $A_n$ is optimal for detecting signals with large spatial extent:

If $\liminf_n |I_n| > 0$ and $|\mu_n| = \frac{b_n}{\sqrt{n}}$ with $b_n \to \infty$, then $P_{f_n}(A_n > \tau_n) \to 1$.

2. $A_n$ is not optimal for detecting signals with small spatial extent:

If $|I_n| \to 0$ and $|\mu_n| \sqrt{|I_n|} = K \sqrt{\frac{\log \frac{1}{|I_n|}}{n}}$ with $K < \sqrt{4}$, then $\limsup_n P_{f_n}(A_n > \tau_n) \le \alpha$.

3. If $K \ge \sqrt{4} + \epsilon_n$ above, where $\epsilon_n \sqrt{\log \frac{1}{|I_n|}} \to \infty$, then $P_{f_n}(A_n > \tau_n) \to 1$.



Comparing with (2) one sees that on small scales the ALR requires $|\mu_n| \sqrt{|I_n|}$ to be about
$\sqrt{2}$ times larger than the optimal threshold. This discrepancy is not very consequential: The
simulations in Section 4 show that the corresponding loss of power is quite small for sample sizes
up to $n = 10000$, which is the largest sample size we were able to simulate due to the $O(n^2)$
computational complexity of the ALR.



A heuristic explanation of why the scan and the ALR do not obtain optimality is as follows:
There are $n$ disjoint intervals $I$ of length $1/n$. The corresponding likelihood ratio statistics $Y_n(I)$
are i.i.d. $N(0,1)$ under the null hypothesis, thus their maximum behaves like $\sqrt{2 \log n}$. But in
the case of large intervals of length $1/c$ (say), there are only $c$ disjoint intervals that result in
independent statistics $Y_n(I)$. The statistics for the other intervals of length $1/c$ are dependent
with these $Y_n(I)$ since the intervals overlap. Thus the null distribution of that maximum behaves
roughly like the maximum of $c$ i.i.d. $N(0,1)$, which is $O_p(1)$. Hence the overall maximum $M_n$ is
dominated by the small intervals, with a corresponding loss of power at large intervals.
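The different sizes of these two maxima are easy to see in a small Monte Carlo experiment. The sketch below is illustrative only: the function name, the number of repetitions, and the seed are arbitrary choices, and it simulates independent $N(0,1)$ variables directly rather than the interval statistics themselves.

```python
import math
import numpy as np

def mean_max(m, reps=200, seed=0):
    """Monte Carlo estimate of E[max of m i.i.d. N(0,1) variables]."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((reps, m)).max(axis=1).mean()

# n disjoint intervals of length 1/n versus c disjoint intervals of length 1/c.
n, c = 10000, 4
small_scale_max = mean_max(n)             # grows with n, roughly like sqrt(2 log n)
large_scale_max = mean_max(c)             # stays bounded as n grows
reference = math.sqrt(2.0 * math.log(n))  # first-order approximation for the small scales
```

A single critical value calibrated to the small-scale maximum is therefore far larger than what the large scales alone would require.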

As for the ALR, if a detectable signal lives on a large interval $I_n$, then $Y_n(I)$ will be significant
provided $I$ has a nonvanishing overlap with $I_n$. Since there are $\sim n^2$ such intervals, the ALR will
be significant despite the divisor $n^2$ in its definition. In the case of small intervals $I_n$, however,
the number of intervals $I$ that yield a sufficiently large statistic $Y_n(I)$ is so small compared to
the total number of intervals ($\sim n^2$) that their contribution to $A_n$ is annihilated by the divisor $n^2$.
More precisely: The likelihood ratio statistic is maximized at $I = I_n$, where its size is $\gg |I_n|^{-1}$
(up to log terms) for signals at the detection threshold (1). Thus if $|I_n| = 1/n$, then there are
only a few significant likelihood ratios and their magnitude is about $|I_n|^{-1} = n$. Thus dividing by
$n^2$ will let their contribution vanish unless the size of the likelihood ratio statistics is increased to
$|I_n|^{-2} = n^2$ by doubling $|\mu_n|^2 |I_n|$ in the log likelihood ratio.

3 The condensed average likelihood ratio statistic 

The above heuristic suggests that an optimal version of the ALR may be constructed by averaging
the likelihood ratios not over all $\sim n^2$ intervals of $\mathcal{J}_n$ but over a subset of $\mathcal{J}_n$ with cardinality
close to $n$. The general idea of why such an approximation is feasible is as follows: For larger
intervals, there is not much lost by considering only intervals with endpoints on a coarser grid
as long as the distance between such gridpoints is small compared to the length of the intervals.
Then these intervals will still provide a good approximation to $\mathcal{J}_n$, while the cardinality of this
approximating set can be reduced dramatically. To implement this idea, we modify the approach in 
Walther (2010) and Rufibach and Walther (2010) and group intervals into £ max = [log 2 j-^— ] sets, 
each of which contains intervals having about the same length: The approximating set I app (£) 
consists of intervals that contain between mi + 1 and 2rri£ design points and whose endpoints are 
restricted to a grid consisting of every c^fh design point, where m,£ = n2 and di = v lo - 



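Our overall approximating set is then the union of these $\mathcal{I}_{app}(\ell)$ together with all small intervals¹:

$$\mathcal{I}_{app} := \bigcup_{\ell=1}^{\ell_{\max}} \mathcal{I}_{app}(\ell) \ \cup\ \mathcal{I}_{small}, \quad \text{where}$$

$$\mathcal{I}_{app}(\ell) := \Big\{ \Big(\frac{j}{n}, \frac{k}{n}\Big] \in \mathcal{J}_n :\ j, k \in \{ i d_\ell,\ i = 0, 1, \ldots \} \text{ and } m_\ell < k - j \le 2 m_\ell \Big\},$$

$$\mathcal{I}_{small} := \Big\{ \Big(\frac{j}{n}, \frac{k}{n}\Big] \in \mathcal{J}_n :\ k - j \le m_{\ell_{\max}} \Big\}.$$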



We suppress the dependence on $n$ for notational simplicity. Our condensed ALR is thus

$$A_{n,cond} := \frac{1}{\# \mathcal{I}_{app}} \sum_{I \in \mathcal{I}_{app}} \exp\Big(\frac{(Y_n(I))^2}{2}\Big).$$

The above choice of $d_\ell$ and $m_\ell$ results in statistical and computational efficiency for the ALR and
is different from the choices used in Walther (2010) and Rufibach and Walther (2010). We give an
explanation for this choice in the proof of Theorem 3.

Theorem 3 The condensed ALR $A_{n,cond}$ is optimal for detecting signals with arbitrary spatial
extent, i.e. it has asymptotic power 1 against signals $f_n$ satisfying (1). Furthermore, $A_{n,cond}$ can
be computed in $O(n \log^2 n)$ time.
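The construction of the approximating set can be sketched in code. This is a minimal illustration under loudly stated assumptions, not the authors' algorithm: the function names are hypothetical, the grid spacing is supplied by the caller as a hypothetical `grid_spacing(ell, m)` callback because the paper's specific choice of the spacing is not reproduced here, the value of `ell_max` is likewise left to the caller, and the average is taken over the number of intervals in the set. The sketch enumerates the set explicitly and makes no attempt at the fast computation.

```python
import numpy as np

def approximating_set(n, ell_max, grid_spacing):
    """List of index pairs (j, k) of the intervals (j/n, k/n] in the approximating set."""
    intervals = []
    for ell in range(1, ell_max + 1):
        m = n * 2.0 ** (-ell)                  # m_ell = n * 2^(-ell)
        d = max(1, int(grid_spacing(ell, m)))  # caller-supplied (hypothetical) spacing
        for j in range(0, n + 1, d):
            for k in range(j + d, n + 1, d):
                if k - j > 2 * m:              # too long for block ell
                    break
                if k - j > m:                  # between m_ell + 1 and 2 m_ell points
                    intervals.append((j, k))
    m_small = n * 2.0 ** (-ell_max)
    for j in range(n):                         # all small intervals, exact endpoints
        for k in range(j + 1, min(n, j + int(m_small)) + 1):
            intervals.append((j, k))
    return intervals

def condensed_alr(y, ell_max, grid_spacing):
    """Average of exp(Y^2 / 2) over the approximating set (assumed normalization)."""
    n = len(y)
    s = np.concatenate(([0.0], np.cumsum(y)))
    pairs = np.array(approximating_set(n, ell_max, grid_spacing))
    j, k = pairs[:, 0], pairs[:, 1]
    stats = (s[k] - s[j]) / np.sqrt(k - j)
    return np.mean(np.exp(stats ** 2 / 2.0))
```

With spacing 1 on every block and $m_{\ell_{\max}} = 1$ the set coincides with all of $\mathcal{J}_n$; a coarser spacing thins out the long intervals while every short interval is kept.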

4 A simulation study 

Since the results of the previous two sections are asymptotic, we illustrate the performance of 
the scan and the ALR in a finite sample context with a simulation study. For a more informative 
comparison, we also include in our study modifications of the scan aimed at increasing its power 
that have recently been proposed in the literature. 

One such modification is the blocked scan, see Walther (2010). It defines the $\ell$th block as
comprising all intervals that contain between $m_\ell = n 2^{-\ell}$ and $2 m_\ell$ design points. Then one
assigns different critical values to different blocks such that the significance level on the $\ell$th block
decreases as $\sim \ell^{-2}$.



1 In the density setting of Walther (2010) and Rufibach and Walther (2010) there is no need to consider intervals with
less than $O(\log n)$ observations. But in the current regression setting we can and do also consider optimal detection of
signals supported on small intervals containing as little as one design point. For this reason we include $\mathcal{I}_{small}$ in our
approximating set.



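In more detail, for $m_\ell$ and $\ell_{\max}$ as above define

$$M_{n,\ell} := \max_{m_\ell < k - j \le m_{\ell-1}} \Big| Y_n\Big(\Big(\frac{j}{n}, \frac{k}{n}\Big]\Big) \Big|$$

and $M_{n,\ell_{\max}+1} := \max_{k - j \le m_{\ell_{\max}}} |Y_n((\frac{j}{n}, \frac{k}{n}])|$. Thus the $(\ell_{\max} + 1)$st block comprises all small
intervals that contain up to $m_{\ell_{\max}}$ design points. The blocked scan declares that a signal is present
if $M_{n,\ell} > q_\ell\big(\frac{\tilde{\alpha}}{(A + \ell)^2}\big)$ for any $\ell \in \{1, \ldots, \ell_{\max} + 1\}$. Here $q_\ell\big(\frac{\tilde{\alpha}}{(A + \ell)^2}\big)$ is the $\big(1 - \frac{\tilde{\alpha}}{(A + \ell)^2}\big)$
quantile of the null distribution of $M_{n,\ell}$, $A := 10$ (say), and $\tilde{\alpha}$ is chosen such that the overall
significance level is $\alpha$:

$$P_0\Big( \bigcup_{\ell=1}^{\ell_{\max}+1} \Big\{ M_{n,\ell} > q_\ell\Big(\frac{\tilde{\alpha}}{(A + \ell)^2}\Big) \Big\} \Big) = \alpha.$$

The critical values $q_\ell$ and $\tilde{\alpha}$ can be easily simulated with Monte Carlo similarly as in Rufibach and
Walther (2010). We suppress the dependence of $q_\ell$ on $n$ for notational simplicity.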

For the second modification of the scan we employ the penalty term introduced by Dümbgen
and Spokoiny (2001) in the context of inference about a function. This penalization method can be
readily adapted for use with the scan statistic in the Gaussian regression setting considered here.
The idea is to subtract off the putative maximum at each scale in order to put the different scales
on an equal footing: The penalized scan is

$$\max_{0 \le j < k \le n} \Big( \Big| Y_n\Big(\Big(\frac{j}{n}, \frac{k}{n}\Big]\Big) \Big| - \sqrt{2 \log \frac{e n}{k - j}} \Big).$$



Both the penalized scan and the blocked scan aim at improving the power of the scan by fixing
the miscalibration across different scales described in Section 2. A drawback of the penalized
scan is that it requires the specification of the penalty term, which has to be derived for each 
particular situation at hand. The form of the penalty term depends on the tail behavior of the 
local test statistics, their dependence structure, and the entropy of the underlying space. Thus 
these properties have to be derived on a case-by-case basis and this derivation is typically far from 
straightforward, while the block method has the advantage that it provides a general recipe that 
does not require any case-specific input. 
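For the Gaussian regression setting considered here, the penalized scan can be evaluated directly. The sketch below is a minimal quadratic-time illustration with a hypothetical function name; it assumes the penalty $\sqrt{2 \log(e n / (k - j))}$, i.e. $\sqrt{2 \log(e / |I|)}$ for an interval $I$ containing $k - j$ design points.

```python
import numpy as np

def penalized_scan(y):
    """Maximum over all intervals of |Y_n(I)| minus the scale-dependent penalty."""
    n = len(y)
    s = np.concatenate(([0.0], np.cumsum(y)))  # s[k] = y_1 + ... + y_k
    j, k = np.triu_indices(n + 1, k=1)         # every pair with j < k
    local = np.abs(s[k] - s[j]) / np.sqrt(k - j)
    penalty = np.sqrt(2.0 * np.log(np.e * n / (k - j)))
    return np.max(local - penalty)
```

The penalty is largest for the shortest intervals, so a given value of $|Y_n(I)|$ counts for more on a large interval than on a small one.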

Computationally efficient algorithms for evaluating the scan or an approximation thereof have 
been introduced in the literature, see e.g. Neill and Moore (2004), Arias-Castro et al. (2005), 
Rufibach and Walther (2010) and Walther (2010). Unlike the case of the condensed ALR, where 
the particular choice of the approximating set leads to optimal power properties, it appears that
evaluating the scan on an appropriate approximating set does not lead to optimal detection by 
itself. Rather, it appears that optimal detection requires the use of scale-dependent critical values, 
and efficient computation has to be addressed separately using any of the methods cited above. 

In our simulation study we first consider signals $f_n$ with fixed norm $\|f_n\| := |\mu_n| \sqrt{|I_n|}$ but
varying spatial extent. Table 1 gives the power of the scan, the ALR, the condensed ALR, the
penalized scan, and the blocked scan for a sample size of $n = 10000$. The results are visualized
in the left plot in Figure 1. One sees that the overall performance of the scan is inferior to that of
the other four methods, whose performances are quite similar. In particular, the power of the scan
is not increasing with the spatial extent of the signal as opposed to the other four methods. As a
consequence, the scan is competitive only for signals on the smallest scales.
Figure 1: Left: Power of the scan and the ALR for detecting signals $f_n$ with fixed norm $\|f_n\| = 0.04$
but varying spatial extent $|I_n|$, $n = 10000$. Right: Power of the scan and the ALR for
detecting signals $f_n$ with varying norms $\|f_n\|$ and random spatial extent. The power curves for the
condensed ALR, the penalized scan, and the blocked scan are similar to that of the ALR and are
not plotted, see Tables 1 and 2.

Table 1 also shows an improvement in power of the condensed ALR vis-a-vis the ALR on
small scales, illustrating Theorems 2 and 3. However, this improvement is modest, at least for the
sample size under consideration, and thus the main advantage of the condensed ALR is arguably
the dramatic reduction in computation time to $O(n \log^2 n)$ versus $O(n^2)$ for the ALR. We were
able to accurately simulate critical values for the condensed ALR with a sample size of 1 million
in a matter of hours, whereas this computation would take hundreds of days for the ALR.

Table 2 shows how the power varies as a function of $\|f_n\|$, see the right plot in Figure 1 for a
visual representation. The spatial extent of the signal was chosen uniformly in [0, 1] in each of the
2000 Monte Carlo simulations. The power curves of the last four methods are again quite similar,
and superior to that of the scan. One sees that the scan requires a signal with almost twice the
norm to achieve the power of the four other methods. According to the results in the previous
sections, this discrepancy will increase with the sample size.
Table 1: Power in percent for detecting signals $f_n$ with fixed norm $\|f_n\| = 0.04$ but varying spatial
extent $|I_n|$, $n = 10000$.



xlOO 2 2.5 3 3.5 4 4.5 



scan 7 9 15 24 39 57 74 

ALR 30 45 61 75 88 94 97 

condensed ALR 30 44 60 75 87 94 97 

penalized scan 26 40 57 74 85 93 97 

blocked scan 24 35 51 69 82 92 96 



Table 2: Power in percent for detecting signals f n with varying norms ||/ n || and random spatial 
extent, n = 10000. 

All power values in Tables 1 and 2 are with respect to a 5% significance level. The corresponding
critical values were simulated with 10000 Monte Carlo samples, and the power was simulated
with 2000 Monte Carlo samples. The location of the signal was chosen at random in each of these
simulations to avoid confounding the results with the approximation scheme of the condensed
ALR.
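The Monte Carlo procedure described above (simulate a critical value under the null, then estimate power under the alternative) can be sketched as follows. This is a simplified illustration, not the code used for the tables: the function names, default sample counts, and seed are hypothetical, and for brevity the signal location is fixed by the caller rather than drawn at random as in the study.

```python
import numpy as np

def scan_statistic(y):
    """Largest absolute standardized interval sum, computed in O(n^2)."""
    n = len(y)
    s = np.concatenate(([0.0], np.cumsum(y)))
    j, k = np.triu_indices(n + 1, k=1)
    return np.max(np.abs(s[k] - s[j]) / np.sqrt(k - j))

def simulate_power(statistic, n, mu, j, k, n_null=1000, n_alt=200, alpha=0.05, seed=0):
    """Return (critical value, estimated power) against the signal mu * 1{(j/n, k/n]}."""
    rng = np.random.default_rng(seed)
    # Critical value: empirical (1 - alpha) quantile of the statistic under pure noise.
    null = np.array([statistic(rng.standard_normal(n)) for _ in range(n_null)])
    crit = np.quantile(null, 1.0 - alpha)
    # Power: rejection frequency when the signal is added to fresh noise.
    signal = np.zeros(n)
    signal[j:k] = mu                           # design points j+1, ..., k
    rejections = sum(statistic(signal + rng.standard_normal(n)) > crit
                     for _ in range(n_alt))
    return crit, rejections / n_alt
```

Any of the statistics discussed in this section can be passed as `statistic`, so the same harness serves to compare their power at a common significance level.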

5 Conclusion 

The scan is optimal only for detecting signals on the smallest scales. The ALR has a superior 
overall performance and is optimal for detecting signals on all scales except on the smallest ones, 
but the loss of power there appears to be modest. Moreover, by averaging the likelihood ratios over 
a particular subset of intervals rather than over all intervals, the resulting condensed ALR is simul- 
taneously optimal for all scales and also allows for efficient computation. In contrast, improved 
versions of the scan, such as the penalized scan and the blocked scan, appear to require the use
of scale-dependent critical values, and thus it appears that statistical optimality and computational 
efficiency have to be addressed separately for the scan. 

The results of this paper are developed in the Gaussian white noise model since it is known 
that the conceptual results in that model are applicable and relevant for a wide range of related 
problems. We note that the concrete implementation of the results derived in the Gaussian white 
noise model requires additional work that depends on the concrete problem at hand. For example, 
in the univariate regression setting Rohde (2008) employs local signed rank tests to transform non- 
Gaussian data into statistics with sub-gaussian tails, and Cai et al. (2011) employ a local median 
transformation for the same purpose. To see why such an additional step is required, note that 
the null distribution of both the scan and the ALR depends sensitively on the tails of the error 
distribution, and hence on the assumption of Gaussianity. The above papers show rigorously that 
the Gaussian white noise model is applicable after the local signed rank or the local median trans- 
formation, assuming only e.g. symmetry of the error distribution. These arguments are technically 
sophisticated and thus the transformation step constitutes a piece of methodological work by it- 
self. It is thus helpful to separate the conceptual issues involving the scan and the ALR from the 
particular implementation and to present an unencumbered exposition in the Gaussian white noise 
model. In particular, the heuristics developed in this model give guidance how the corresponding 
problems might be addressed in related detection problems, such as the detection of clusters in 
models involving densities and intensities, as well as the important case of detecting multivariate 
signals. 

6 Proofs 

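Note that $f_n(x) = \mu_n 1_{I_n}(x)$ implies for any interval $I \in \mathcal{J}_n$:

$$Y_n(I) = Z_n(I) + \text{sign}(\mu_n) \sqrt{n}\, |\mu_n| \frac{|I \cap I_n|}{\sqrt{|I|}}, \tag{4}$$

where $Z_n(I)$ denotes the standardized sum of the noise variables $Z_i$ over $I$, defined in the same way as $Y_n(I)$.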
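The decomposition of a local statistic into its noise part and a deterministic mean shift can be verified numerically. The sketch below uses hypothetical function names and index conventions: an interval $(j/n, k/n]$ is represented by the integer pair $(j, k)$ and covers the design points $j+1, \ldots, k$, and the shift is taken to be $\text{sign}(\mu_n) \sqrt{n}\, |\mu_n|\, |I \cap I_n| / \sqrt{|I|}$.

```python
import numpy as np

def standardized_sum(x, j, k):
    """Sum of x over the design points j+1, ..., k, divided by sqrt(k - j)."""
    return np.sum(x[j:k]) / np.sqrt(k - j)

def mean_shift(n, mu, jn, kn, j, k):
    """Shift of Y_n(I) for I = (j/n, k/n] when the signal mu lives on (jn/n, kn/n]."""
    overlap = max(0, min(k, kn) - max(j, jn)) / n  # |I ∩ I_n|
    length = (k - j) / n                           # |I|
    return np.sign(mu) * np.sqrt(n) * abs(mu) * overlap / np.sqrt(length)
```

For instance, with `y = z + signal`, the value `standardized_sum(y, j, k)` agrees with `standardized_sum(z, j, k) + mean_shift(n, mu, jn, kn, j, k)` up to floating point error for every interval, whether or not it overlaps the signal.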



We will use the following consequence of a result of Dümbgen and Spokoiny (2001):

Lemma 1 Let $\epsilon \in (0, 1)$ and $J \in \mathcal{J}_n$, where $J$ does not depend on $Z_n$. Then

$$\max_{I \in \mathcal{J}_n:\ I \subset J,\ |I| \ge \epsilon |J|} |Z_n(I)| \ \le\ L + \sqrt{2 \log \frac{e}{\epsilon}}$$

for a universal random variable $L$ which is finite almost surely.

Proof of Lemma 1. Writing $W$ for Brownian motion and $j, k$ for integer indices:

$$\max_{I \in \mathcal{J}_n:\ I \subset J} \Big( |Z_n(I)| - \sqrt{2 \log \frac{e |J|}{|I|}} \Big) = \max_{0 \le j < k \le n|J|} \Big( \frac{|W(k) - W(j)|}{\sqrt{k - j}} - \sqrt{2 \log \frac{e |J| n}{k - j}} \Big)$$

$$\le \sup_{0 \le s < t \le n|J|} \Big( \frac{|W(t) - W(s)|}{\sqrt{t - s}} - \sqrt{2 \log \frac{e |J| n}{t - s}} \Big) \tag{5}$$

$$\stackrel{d}{=} \sup_{0 \le s < t \le 1} \Big( \frac{|W(t) - W(s)|}{\sqrt{t - s}} - \sqrt{2 \log \frac{e}{t - s}} \Big) =: L$$

by Brownian scaling. Thus the random variable $L$ defined above is universally applicable for all
$n$, $\epsilon$ and $J$. Importantly, $L$ is finite almost surely, see Sec. 6.1 in Dümbgen and Spokoiny (2001).
□

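Proof of Theorem 1. As for part 1, (4) implies $M_n \ge |Y_n(I_n)| \ge -|Z_n(I_n)| + \sqrt{2 \log n} + \epsilon_n \sqrt{\log n}$.
Since $Z_n(I_n) \sim N(0, 1)$ and $\epsilon_n \sqrt{\log n} \to \infty$, the claim follows from $\kappa_n = \sqrt{2 \log n} + O(1)$, see (6).

For the proof of part 2, set $b_n := \epsilon_n \sqrt{\log n} \to \infty$ and consider first the collection of intervals
$\mathcal{J}_{n,1} := \{I \in \mathcal{J}_n : I \cap I_n \neq \emptyset \text{ and } |I_n| / b_n \le |I| \le b_n |I_n|\}$. So $I \in \mathcal{J}_{n,1}$ implies
$I \subset \big(\frac{j_n - \lceil b_n n |I_n| \rceil}{n}, \frac{k_n + \lceil b_n n |I_n| \rceil}{n}\big] \cap (0, 1]$ and thus Lemma 1 gives

$$\max_{I \in \mathcal{J}_{n,1}} |Z_n(I)| \ \le\ L + \sqrt{2 \log \frac{e (1 + 2 b_n) |I_n|}{|I_n| / b_n}} \ \le\ L + 2 \sqrt{\log(3 b_n)}.$$

Together with (4) and $|I \cap I_n| \le \sqrt{|I| |I_n|}$ this yields

$$P_{f_n}\Big( \max_{I \in \mathcal{J}_{n,1}} |Y_n(I)| > \kappa_n \Big) \ \le\ P\Big( \max_{I \in \mathcal{J}_{n,1}} |Z_n(I)| + \sqrt{2 \log n} - b_n > \kappa_n \Big)$$

$$\le\ P\Big( L > \kappa_n - \sqrt{2 \log n} + b_n - 2 \sqrt{\log(3 b_n)} \Big) \ \to\ 0$$

since $\kappa_n = \sqrt{2 \log n} + O(1)$ by (6).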



Next, only if $b_n < \log^3 n$ do we need to consider $\mathcal{J}_{n,2} := \{I \in \mathcal{J}_n : I \cap I_n \neq \emptyset$ and either
$|I_n| / \log^3 n \le |I| < |I_n| / b_n$ or $b_n |I_n| < |I| \le |I_n| \log^3 n\}$. Similarly as above, $I \in \mathcal{J}_{n,2}$ implies
that $I$ is contained in an interval $J \in \mathcal{J}_n$ with $|J| \le (1 + 2 \log^3 n) |I_n|$. Thus Lemma 1 yields

$$\max_{I \in \mathcal{J}_{n,2}} |Z_n(I)| \ \le\ L + \sqrt{2 \log \frac{e (1 + 2 \log^3 n) |I_n|}{|I_n| / \log^3 n}} \ \le\ L + 4 \sqrt{\log \log n}.$$

One readily checks that $I \in \mathcal{J}_{n,2}$ implies $|I \cap I_n| \le \sqrt{|I| |I_n| / b_n}$. Thus (4) gives

$$P_{f_n}\Big( \max_{I \in \mathcal{J}_{n,2}} |Y_n(I)| > \kappa_n \Big) \ \le\ P\Big( \max_{I \in \mathcal{J}_{n,2}} |Z_n(I)| + \frac{\sqrt{2 \log n} - b_n}{\sqrt{b_n}} > \kappa_n \Big)$$

$$\le\ P\Big( L > \kappa_n - \frac{\sqrt{2 \log n} - b_n}{\sqrt{b_n}} - 4 \sqrt{\log \log n} \Big) \ \to\ 0$$

since $\kappa_n = \sqrt{2 \log n} + O(1)$.
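Finally, consider $\mathcal{J}_{n,3} := \{I \in \mathcal{J}_n : I \cap I_n = \emptyset \text{ or } |I| < |I_n| / \log^3 n \text{ or } |I| > |I_n| \log^3 n\}$.
Since $I \in \mathcal{J}_{n,3}$ implies $|I \cap I_n| \le \sqrt{|I| |I_n| / \log^3 n}$, we get by (4):

$$P_{f_n}\Big( \max_{I \in \mathcal{J}_{n,3}} |Y_n(I)| > \kappa_n \Big) \ \le\ P\Big( \max_{I \in \mathcal{J}_n} |Z_n(I)| > \kappa_n - \frac{\sqrt{2 \log n} - b_n}{\sqrt{\log^3 n}} \Big) \ \to\ \alpha,$$

where the convergence follows from the following fact:

A sequence $\{c_n\}$ satisfies $\lim_n P\big( \max_{I \in \mathcal{J}_n} |Z_n(I)| > c_n \big) = \alpha$ if and only if

$$c_n = \sqrt{2 \log n} + (2 \log n)^{-1/2} \Big( \frac{1}{2} \log \log n + C(\alpha) \Big) + o\big((\log n)^{-1/2}\big) \tag{6}$$

for a certain constant $C(\alpha)$. This fact follows from Theorem 1.3 in Kabluchko (2008) or with
some work from the earlier Theorem 1 in Siegmund and Venkatraman (1995).

Since $\mathcal{J}_n = \mathcal{J}_{n,1} \cup \mathcal{J}_{n,2} \cup \mathcal{J}_{n,3}$ the theorem is proved. □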

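Proof of Theorem 2. We begin by showing that in the null case of no signal, i.e. $Y_n = Z_n$,
we have

$$A_n = O_p(1). \tag{7}$$

For $m > 0$ define the event $B_{m,n} := \{|Z_n((\frac{j}{n}, \frac{k}{n}])| \le C(\frac{k - j}{n}) + m \text{ for all } 0 \le j < k \le n\}$,
where $C(\delta) := \sqrt{2 \log e / \delta}$. Then Markov's inequality gives for $\lambda > 0$

$$P_0(A_n > \lambda) \ \le\ \frac{1}{\lambda n^2} \sum_{j=0}^{n-1} \sum_{k=j+1}^{n} E\Big[ \exp\Big( \frac{(Z_n((\frac{j}{n}, \frac{k}{n}]))^2}{2} \Big) 1(B_{m,n}) \Big] + P(B_{m,n}^c)$$

$$\le\ \frac{1}{\lambda n^2} \sum_{j=0}^{n-1} \sum_{k=j+1}^{n} \int_{-C(\frac{k-j}{n}) - m}^{C(\frac{k-j}{n}) + m} e^{z^2/2} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz + P(B_{m,n}^c)$$

$$\le\ \frac{1}{\lambda} \Big( \frac{1}{n} \sum_{i=1}^{n} \sqrt{2 \log \frac{e n}{i}} + m \Big) + P(B_{m,n}^c)$$

$$\le\ \frac{4 + m}{\lambda} + P(L > m)$$

by (5). This sum can be made arbitrarily small by choosing $m$ and $\lambda$ appropriately, proving (7).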

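To prove parts 1 and 3 together we consider $f_n$ with arbitrary spatial extent and
$|\mu_n| \sqrt{|I_n|} \ge \big( \sqrt{4 \log \frac{1}{|I_n|}} + b_n \big) / \sqrt{n}$ with $b_n \to \infty$. Set $\epsilon_n := \min\big(1, b_n (\log \frac{e}{|I_n|})^{-1/2}\big)$ and
$\mathcal{J}(I_n) := \{I \in \mathcal{J}_n : I \subset I_n \text{ and } |I| \ge |I_n| (1 - \epsilon_n / 2)\}$. Then $\# \mathcal{J}(I_n) \ge \lceil \epsilon_n n |I_n| / 4 \rceil^2$ since each of the
$\lceil \epsilon_n n |I_n| / 4 \rceil$ smallest (largest) design points in cl$(I_n)$ may serve as a left (right) endpoint for some
$I \in \mathcal{J}(I_n)$. Lemma 1 gives

$$\max_{I \in \mathcal{J}(I_n)} |Z_n(I)| \ \le\ L + \sqrt{2 \log \frac{e}{1 - \epsilon_n / 2}} \ \le\ L + 2.$$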

Together with 'f 17 " 1 



ro > V 1 ~ ¥ and 1 4 ! 1 we § et: 



min |y n (I)| > (.L] og -L. + b n )Jl-5L-L-2 

iej(in) vy j n /y 2 



a iKib + T- £ - 2 
since (x + y)y/l -min(l,y)/2 > x + § for x € [0, 2], y > 0. Thus, writing R n :=^f - L-2,



, ^ jJVn) . ({Y n {I)f 

A n > t, — mm exp' 

n 2 ieJ(in) 



> £ -^- exp{2log-j- + R n (R n /2 + J41og-^)} 

= J exp{i?„(i?„/2+ J41og-^)} 

^Iflog^Wit^lti^l) 

^4' oo since R n ^4' oo and e n , /log - — - > 1 eventually. 

V -*« 



The claim follows since $\tau_n = O(1)$ by (7).

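For the proof of part 2 we partition $\mathcal{J}_n$ into $\mathcal{J}_{n,1} := \{I \in \mathcal{J}_n : I \cap I_n \neq \emptyset \text{ and } |I_n| / \log^4 \frac{1}{|I_n|} \le
|I| \le |I_n| \log^4 \frac{1}{|I_n|}\}$ and $\mathcal{J}_{n,2} := \mathcal{J}_n \cap \mathcal{J}_{n,1}^c$. We will show for $\mathcal{J} = \mathcal{J}_{n,1}, \mathcal{J}_{n,2}$:

$$\frac{1}{n^2} \sum_{I \in \mathcal{J}} \Big[ \exp\Big( \frac{(Y_n(I))^2}{2} \Big) - \exp\Big( \frac{(Z_n(I))^2}{2} \Big) \Big] \ \stackrel{P}{\to}\ 0. \tag{8}$$

Since it can be shown that in the null case $f_n = 0$ the ALR $A_n$ converges weakly to a continuous
limit, the claim of part 2 follows from (8).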

To prove (8) for $\mathcal{J} = \mathcal{J}_{n,1}$ we follow the proof of part 2 of Theorem 1 (set $b_n := \log^4 \frac{1}{|I_n|}$
there) and conclude $\max_{I \in \mathcal{J}_{n,1}} |Z_n(I)| \le L + 2 \sqrt{\log(3 \log^4 \frac{1}{|I_n|})}$. Hence for a fixed constant $\lambda$
which will be specified below we obtain:

$$P\Big( \mathcal{A}_n := \Big\{ \max_{I \in \mathcal{J}_{n,1}} |Z_n(I)| \le \lambda \sqrt{\log \frac{1}{|I_n|}} \Big\} \Big) \ \to\ 1.$$

That proof also shows that every $I \in \mathcal{J}_{n,1}$ is contained in a certain interval of length
$(1 + 2 \log^4 \frac{1}{|I_n|}) |I_n|$, thus $\# \mathcal{J}_{n,1} \le \big( (1 + 2 \log^4 \frac{1}{|I_n|}) n |I_n| \big)^2$. On the event $\mathcal{A}_n$ we have by (4):



1 V" I 



(Yn(I))' 



exp 



|J»|< 



[Zn{I)f 



iej„ 



2 ) r \ 2 

\2 



< 5— 2 exp max 1 \- max \Z n (I) \y/n \I n /x„ 

n z UeJk.i 2 2 /e.7n,i / 

A 2 lo gR ^ _ K \ 1 



£ is(l g— ) |/ n | 2 exp 

\J-n\ ' 
18(log— ) |/„l 2 -^-^- x - 

J 77 



+ ^T log nm + Ain ° g ^m) 



Since $K < 2$ we can choose $\lambda = \lambda(K) > 0$ such that the above expression goes to 0 as $|I_n| \to 0$,
proving (8) for $\mathcal{J} = \mathcal{J}_{n,1}$.

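To prove (8) for $\mathcal{J} = \mathcal{J}_{n,2}$ we proceed similarly as in the proof of (7) and employ the event
$B_{m,n}$ defined there. Then for $\lambda, m > 0$ Markov's inequality gives

$$P_{f_n}\Big( \Big| \frac{1}{n^2} \sum_{I \in \mathcal{J}_{n,2}} \Big[ \exp\Big( \frac{(Y_n(I))^2}{2} \Big) - \exp\Big( \frac{(Z_n(I))^2}{2} \Big) \Big] \Big| > \lambda \Big)$$

$$\le\ \frac{1}{\lambda n^2} \sum_{I \in \mathcal{J}_{n,2}} E_{f_n}\Big( \Big| \exp\Big( \frac{(Y_n(I))^2}{2} \Big) - \exp\Big( \frac{(Z_n(I))^2}{2} \Big) \Big| \, 1(B_{m,n}) \Big) + P(B_{m,n}^c).$$

Since (5) gives $P(B_{m,n}^c) \le P(L > m) \to 0$ as $m \to \infty$, it is enough to show that for any fixed $m$
the above expectation converges to 0 as $n \to \infty$, uniformly in $I \in \mathcal{J}_{n,2}$. Using $Z_n(I) \sim N(0,1)$
and writing $\delta_I := \sqrt{n} |\mu_n| \frac{|I \cap I_n|}{\sqrt{|I|}}$ and $C(|I|) := \sqrt{2 \log e / |I|}$, we get with (4):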



(9) 



< 
< 



exp 
1 



(r n (I)) 2 ^ /(Z„(/)) 2 



2 

C(|/|)+m 



2vr 7_c([j|)-m 

7T 



l(Bm 

f/2 



exp(zSi + 5|/2) - 1 
expf 5j(C(\I\) + m) + 8J/2) fc(C7(|J|) + m) 2 + <5 2 (C(|/|) + m)/2 



by bounding the function $z \mapsto \exp(z \delta + \delta^2 / 2) - 1$ above and below with the mean value theorem.
Next we will show that as $n \to \infty$

$$\delta_I \to 0 \ \text{ and } \ \delta_I C^2(|I|) \to 0 \ \text{ uniformly in } I \in \mathcal{J}_{n,2}. \tag{10}$$

This conclusion then also holds for the expression in (9), and (8) follows.



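To prove (10), note that $\delta_I = K \sqrt{\log \frac{1}{|I_n|}} \frac{|I \cap I_n|}{\sqrt{|I| |I_n|}}$. If $I \cap I_n = \emptyset$ then $\delta_I = 0$. If
$|I| < |I_n| / \log^4 \frac{1}{|I_n|}$, then the bound $|I \cap I_n| \le |I|$ yields $\delta_I \le K (\log \frac{1}{|I_n|})^{-3/2}$, while the monotonicity
of the function $x \mapsto \sqrt{x} \log e / x$ for $x \in (0, e^{-1})$ gives

$$\delta_I C^2(|I|) \ \le\ 2 K \sqrt{\log \frac{1}{|I_n|}} \sqrt{\frac{|I|}{|I_n|}} \log \frac{e}{|I|} \ \le\ 2 K \Big( \log \frac{1}{|I_n|} \Big)^{-3/2} \log \frac{e \log^4 \frac{1}{|I_n|}}{|I_n|} \ \le\ 6 K \Big( \log \frac{1}{|I_n|} \Big)^{-1/2}$$

if $n$ is large enough so that $|I_n| / \log^4 \frac{1}{|I_n|} \le e^{-1}$.

If $|I| > |I_n| \log^4 \frac{1}{|I_n|}$, then the bound $|I \cap I_n| \le |I_n|$ yields again $\delta_I \le K (\log \frac{1}{|I_n|})^{-3/2}$, while
$\delta_I C^2(|I|) \le 2 K (\log \frac{1}{|I_n|})^{-3/2} \log \frac{e}{|I|} \le 2 K (\log \frac{1}{|I_n|})^{-1/2}$. □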

Proof of Theorem 3. Before proceeding to the proof we sketch an explanation for the choice of the grid spacing $d_\ell$. For given $I_n$, let $\ell$ be such that the intervals in $\mathcal{I}_{\mathrm{app}}(\ell)$ have length about $|I_n|$. Thus $m_\ell \sim n|I_n|$, i.e. $\ell \sim \log_2\frac{1}{|I_n|}$. An interval $I$ will result in a significant likelihood ratio provided its endpoints lie in a neighborhood of size of order $\epsilon_n|I_n|$ of the endpoints of $I_n$, where $\epsilon_n := \min\bigl(1, b_n(\log\frac{1}{|I_n|})^{-1/2}\bigr)$. Thus the number of significant intervals in $\mathcal{I}_{\mathrm{app}}(\ell)$ is $\sim (\epsilon_n|I_n|n/d_\ell)^2$.

Under (1) the size of the corresponding likelihood ratios is $\gg \frac{(\log_2\frac{1}{|I_n|})^p}{|I_n|}$ for arbitrary $p > 0$.

Thus optimality of $A_{n,\mathrm{cond}}$ will obtain if the number of significant intervals in $\mathcal{I}_{\mathrm{app}}(\ell)$ is at least $\#\mathcal{I}_{\mathrm{app}}\,|I_n|\,(\log_2\frac{1}{|I_n|})^{-p}$ for some fixed $p > 0$, and even if the number is smaller by some factor $\epsilon_n^2$ since $\epsilon_n\sqrt{\log_2\frac{1}{|I_n|}} \ge 1$. Solving this inequality for $d_\ell$ yields $d_\ell \le \sqrt{n^2|I_n|(\log_2\frac{1}{|I_n|})^p/\#\mathcal{I}_{\mathrm{app}}}$. Requiring $\#\mathcal{I}_{\mathrm{app}} \sim n(\log n)^q$ for some $q > 0$ for computational efficiency suggests the choice $d_\ell \sim \sqrt{m_\ell\ell^p}\,(\log n)^{-q/2}$. Computing $\#\mathcal{I}_{\mathrm{app}}$ shows that this choice of $d_\ell$ is indeed consistent with $\#\mathcal{I}_{\mathrm{app}} \sim n(\log n)^q$ provided $p > 1$. Further, it will be seen that $A_{n,\mathrm{cond}} = O_p(1)$ under the null hypothesis requires $\sum_\ell \ell^{1/2-p} < \infty$, i.e. $p > 3/2$. Finally, optimal detection for very small intervals $I_n$ requires that their endpoints are approximated exactly, i.e. it is necessary to have $d_\ell = 1$ for large $\ell$. Thus we need an appropriate combination of a small $p > 3/2$ and a large $q$. Since the required large $q$ results in a noticeably worse computation time, we prefer to stick to an $O(n\log^2 n)$ algorithm by setting $p = 8/5$, $q = 2$, and by explicitly considering all small intervals containing up to $\log n$ design points in lieu of choosing a larger $q$.
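The effect of this choice on the size of the approximating set can be explored numerically. The Python sketch below is an illustration under stated assumptions, not the definition used in the theorem: it takes $m_\ell = n2^{-\ell}$, $d_\ell = \lceil\sqrt{m_\ell\ell^p}(\log n)^{-q/2}\rceil$, levels $\ell = 2,\ldots,\ell_{\max}$ with $\ell_{\max}$ the smallest $\ell$ such that $m_\ell \le \log n$, and a set of small intervals with at most $m_{\ell_{\max}}$ design points. All function names are hypothetical.

```python
import math

def grid_levels(n, p=8/5, q=2):
    """Return [(ell, m_ell, d_ell)] for ell = 2..ell_max under the assumed definitions."""
    log_n = math.log(n)
    ell_max = math.ceil(math.log2(n / log_n))  # assumed: smallest ell with n * 2**-ell <= log n
    levels = []
    for ell in range(2, ell_max + 1):
        m = n / 2 ** ell
        d = math.ceil(math.sqrt(m * ell ** p) / log_n ** (q / 2))
        levels.append((ell, m, d))
    return levels

def count_level(n, m, d):
    """Count intervals (j/n, k/n] with j, k multiples of d, k <= n and m < k - j <= 2m."""
    count = 0
    for j in range(0, n + 1, d):
        first = math.floor((j + m) / d) + 1               # smallest index b with b*d > j + m
        last = min(math.floor((j + 2 * m) / d), n // d)   # largest index b with b*d <= min(j + 2m, n)
        count += max(0, last - first + 1)
    return count

def count_small(n, m_small):
    """Count intervals (j/n, k/n] with 0 <= j < k <= n and k - j <= floor(m_small)."""
    width = int(m_small)
    return sum(min(width, n - j) for j in range(n))

if __name__ == "__main__":
    n = 1000
    levels = grid_levels(n)
    total = sum(count_level(n, m, d) for _, m, d in levels)
    total += count_small(n, levels[-1][1])
    print(total, 9 * n * math.log(n) ** 2)
```

Comparing the printed total with $n\log^2 n$ for several values of $n$ gives a quick sanity check of the claimed order of magnitude for $p = 8/5$, $q = 2$.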

We now prove the theorem, starting with the claim about the computational complexity. Since $\mathcal{I}_{\mathrm{app}}(\ell)$ allows only every $d_\ell$th design point as a potential endpoint for an interval, there are at most $\lceil n/d_\ell\rceil$ left endpoints. For each left endpoint there are at most $\lceil 2m_\ell/d_\ell\rceil$ right endpoints since each interval contains not more than $2m_\ell$ design points. Thus



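$$\#\mathcal{I}_{\mathrm{app}}(\ell) \le \Bigl\lceil\frac{n}{d_\ell}\Bigr\rceil\Bigl\lceil\frac{2m_\ell}{d_\ell}\Bigr\rceil \le 3\,\frac{n\log^2 n}{\ell^{8/5}} \tag{11}$$

and

$$\#\mathcal{I}_{\mathrm{small}} \le n\log n \tag{12}$$

since $m_{\ell_{\max}} \le \log n$. Hence

$$\#\mathcal{I}_{\mathrm{app}} \le \sum_{\ell=1}^{\ell_{\max}} 3\,\frac{n\log^2 n}{\ell^{8/5}} + n\log n \le 9\,n\log^2 n. \tag{13}$$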



$\exp\bigl((Y_n(I))^2/2\bigr)$ can be evaluated in a constant number of steps after an initial one-time computation of the cumulative sum vector of $(Y_i, 1 \le i \le n)$. Since that computation has complexity $O(n)$, the overall computational complexity of computing $A_{n,\mathrm{cond}}$ is dominated by the cardinality of $\mathcal{I}_{\mathrm{app}}$ and hence is $O(n\log^2 n)$.
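The cumulative sum device can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are ours and are not taken verbatim from the theorem: $Y_n(I) = \sum_{i=j+1}^{k} Y_i/\sqrt{k-j}$ for $I = (j/n, k/n]$, $m_\ell = n2^{-\ell}$, $d_\ell = \lceil\sqrt{m_\ell\ell^p}(\log n)^{-q/2}\rceil$, levels $\ell = 2,\ldots,\ell_{\max}$, and a small-interval set containing all intervals with at most $m_{\ell_{\max}}$ design points. The function name `condensed_alr` is hypothetical.

```python
import math
from itertools import accumulate

def condensed_alr(y, p=8/5, q=2):
    """Return (average likelihood ratio over the approximating set, number of intervals used)."""
    n = len(y)  # assumed n >= 8 so that log n and the level range are sensible
    log_n = math.log(n)
    cum = [0.0] + list(accumulate(y))  # cum[k] = y[0] + ... + y[k-1], computed once in O(n)

    def lr(j, k):
        # exp(Y_n(I)^2 / 2) for I = (j/n, k/n], in O(1) from the cumulative sums
        s = (cum[k] - cum[j]) / math.sqrt(k - j)
        return math.exp(0.5 * s * s)

    ell_max = math.ceil(math.log2(n / log_n))  # assumed: smallest ell with m_ell <= log n
    total, count = 0.0, 0
    for ell in range(2, ell_max + 1):
        m = n / 2 ** ell
        d = math.ceil(math.sqrt(m * ell ** p) / log_n ** (q / 2))
        for j in range(0, n, d):                 # left endpoints on the d-grid
            for k in range(j + d, n + 1, d):     # right endpoints on the d-grid
                if k - j > 2 * m:
                    break
                if k - j > m:
                    total += lr(j, k)
                    count += 1
    width = int(n / 2 ** ell_max)                # small intervals: at most this many design points
    for j in range(n):
        for k in range(j + 1, min(n, j + width) + 1):
            total += lr(j, k)
            count += 1
    return total / count, count

if __name__ == "__main__":
    y = [0.0] * 64
    y[16:32] = [3.0] * 16                        # an elevated mean on 16 consecutive design points
    print(condensed_alr(y))
```

Each likelihood ratio costs a constant number of operations, so the running time of this sketch is proportional to the number of intervals visited, in line with the argument above.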

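Next we show that for $f_n = 0$:

$$A_{n,\mathrm{cond}} = O_p(1) \quad\text{as } n \to \infty. \tag{14}$$

Proceeding as in the proof of (7) it is enough to show that

$$\frac{1}{\#\mathcal{I}_{\mathrm{app}}}\sum_{I\in\mathcal{I}_{\mathrm{app}}}\sqrt{2\log\tfrac{1}{|I|}} = O(1). \tag{15}$$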



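Since $I \in \mathcal{I}_{\mathrm{app}}(\ell)$ implies $|I| \ge m_\ell/n = 2^{-\ell}$, we obtain with (11) and (12):

$$
\begin{aligned}
\sum_{I\in\mathcal{I}_{\mathrm{app}}}\sqrt{2\log\tfrac{1}{|I|}} &\le \sum_{\ell}\#\mathcal{I}_{\mathrm{app}}(\ell)\sqrt{2\ell} + n(\log n)\sqrt{2\log n}\\
&\le 5\,n\log^2 n\sum_{\ell}\frac{\ell^{1/2}}{\ell^{8/5}} + \sqrt{2}\,n(\log n)^{3/2}\\
&\le 56\,n\log^2 n.
\end{aligned}
$$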



On the other hand, $\#\mathcal{I}_{\mathrm{app}} \ge \#\mathcal{I}_{\mathrm{app}}(2) \ge \frac{1}{9}n(\log n)^2$ by considerations similar to those establishing (11). (15) follows.

To establish optimality of $A_{n,\mathrm{cond}}$ we proceed as in the proof of Theorem 2 and consider $f_n$ with arbitrary spatial extent and $|\mu_n|\sqrt{|I_n|} \ge \bigl(\sqrt{2\log\frac{1}{|I_n|}} + b_n\bigr)/\sqrt{n}$, where $b_n \to \infty$. As before we define $\epsilon_n := \min\bigl(1, b_n(\log\frac{1}{|I_n|})^{-1/2}\bigr)$ and $\mathcal{J}(I_n) := \{I \in \mathcal{J}_n : I \subset I_n \text{ and } |I| \ge |I_n|(1 - \epsilon_n/2)\}$. Intervals $I \in \mathcal{J}(I_n)$ will contribute significant LR statistics to $A_{n,\mathrm{cond}}$. Since we now require that the endpoints of these intervals fall on a $d_\ell$-grid, there are now many fewer of these intervals. But this is more than compensated by the small cardinality of $\mathcal{I}_{\mathrm{app}}$ appearing in the divisor of $A_{n,\mathrm{cond}}$: This fact allows us to detect $f_n$ with a norm that is smaller than in Theorem 2. In more detail: We define the integer $\ell$ by $m_\ell < n|I_n|(1 - \epsilon_n/4) \le 2m_\ell$ and establish below:



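Lemma 2

If $\ell \le \ell_{\max}$, then
$$\frac{\#\bigl(\mathcal{J}(I_n)\cap\mathcal{I}_{\mathrm{app}}(\ell)\bigr)}{\#\mathcal{I}_{\mathrm{app}}} \ge \frac{\epsilon_n^2|I_n|}{9^3\bigl(\log_2\frac{1}{|I_n|}\bigr)^{8/5}}.$$

If $\ell > \ell_{\max}$, then
$$\frac{\#\bigl(\mathcal{J}(I_n)\cap\mathcal{I}_{\mathrm{small}}\bigr)}{\#\mathcal{I}_{\mathrm{app}}} \ge \frac{\epsilon_n^2|I_n|}{32^2\log_2^2\frac{1}{|I_n|}}.$$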



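As in the proof of Theorem 2 we find $\min_{I\in\mathcal{J}(I_n)}|Y_n(I)| \ge \sqrt{2\log\frac{1}{|I_n|}} + R_n$, where $R_n := \ldots - L - 2$. Thus in the case $\ell \le \ell_{\max}$ Lemma 2 gives

$$
\begin{aligned}
A_{n,\mathrm{cond}} &\ge \frac{\#\bigl(\mathcal{J}(I_n)\cap\mathcal{I}_{\mathrm{app}}(\ell)\bigr)}{\#\mathcal{I}_{\mathrm{app}}}\,\min_{I\in\mathcal{J}(I_n)}\exp\Bigl(\frac{(Y_n(I))^2}{2}\Bigr)\\
&\ge \frac{\epsilon_n^2|I_n|}{9^3\bigl(\log_2\frac{1}{|I_n|}\bigr)^{8/5}}\exp\Bigl\{\log\frac{1}{|I_n|} + R_n\Bigl(R_n/2 + \sqrt{2\log\tfrac{1}{|I_n|}}\Bigr)\Bigr\}\\
&= \frac{\epsilon_n^2}{9^3\bigl(\log_2\frac{1}{|I_n|}\bigr)^{8/5}}\exp\Bigl\{R_n\Bigl(R_n/2 + \sqrt{2\log\tfrac{1}{|I_n|}}\Bigr)\Bigr\}\\
&\ge C\epsilon_n^2\Bigl(\log\frac{1}{|I_n|}\Bigr)\exp(R_n^2/2)\,1(R_n \ge 1) \quad\text{for some universal } C > 0\\
&\to_p \infty \quad\text{since } R_n \to_p \infty \text{ and } \epsilon_n\sqrt{\log\tfrac{1}{|I_n|}} \ge 1 \text{ eventually.}
\end{aligned}
$$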



In the case $\ell > \ell_{\max}$ the same conclusion obtains by using $\mathcal{I}_{\mathrm{small}}$ in place of $\mathcal{I}_{\mathrm{app}}(\ell)$. The claim then follows since the critical value of $A_{n,\mathrm{cond}}$ stays bounded by (14).

Thus the crucial difference to Theorem 2 is the stronger inequality provided by Lemma 2. The corresponding inequality for Theorem 2 is $\#\mathcal{J}(I_n)/n^2 \ge \epsilon_n^2|I_n|^2/4^2$, and the extra term $|I_n|$ causes the loss of efficiency in the case where the spatial extent $|I_n|$ becomes small.

It remains to prove Lemma 2. Elementary considerations show that one can find sets $I_{n,\mathrm{left}}$ and $I_{n,\mathrm{right}}$, each consisting of $p := \lceil\epsilon_n n|I_n|/8\rceil$ consecutive integers, such that $(j,k) \in I_{n,\mathrm{left}}\times I_{n,\mathrm{right}}$ implies $(\frac{j}{n},\frac{k}{n}] \in \mathcal{J}(I_n)$ and also $m_\ell < k - j \le 2m_\ell$ if $\ell \le \ell_{\max}$, resp. $k - j \le m_{\ell_{\max}}$ if $\ell > \ell_{\max}$. Thus in the latter case we immediately obtain $\#\bigl(\mathcal{J}(I_n)\cap\mathcal{I}_{\mathrm{small}}\bigr) \ge p^2$ and the claim of the Lemma obtains with (13), $n|I_n| \ge 1$, and $\log n \le \frac{4}{3}\log_2\frac{1}{|I_n|}$, which follows from $\ell > \ell_{\max}$.

In the case $\ell \le \ell_{\max}$, only a subset of the $p^2$ intervals $\bigl\{(\frac{j}{n},\frac{k}{n}] : (j,k) \in I_{n,\mathrm{left}}\times I_{n,\mathrm{right}}\bigr\}$ belongs to $\mathcal{J}(I_n)\cap\mathcal{I}_{\mathrm{app}}(\ell)$, namely those for which both $j$ and $k$ lie on the $d_\ell$-grid. The number of such indices $j$ is at least $\lfloor p/d_\ell\rfloor \ge \frac{\epsilon_n\sqrt{n|I_n|}\log n}{9\ell^{4/5}}$ for $n$ large enough, where this inequality follows from the fact that $\ell \le \ell_{\max}$ implies $n|I_n| \ge m_\ell \ge (\log n)/2$ and $d_\ell \ge (\log n)^{3/10}$; further $\epsilon_n\sqrt{\log n} \to \infty$ by the definition of $\epsilon_n$. The same bound obtains for the number of indices $k$ that lie on the $d_\ell$-grid. The claim of the Lemma then follows with (13) and $\ell \le \log_2\frac{1}{|I_n|}$, which is a consequence of the definition of $\ell$. □